{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Holt-Winters Demo\n",
    "\n",
    "Holt-Winters is a time-series analysis technique, used in both forecasting future entries in a time series as well as in providing exponential smoothing, where weights are assigned against historical data with exponentially decreasing impact. It does this by analyzing three components of the data: level, trend, and seasonality. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well as cuDF DataFrames as the input.\n",
    "\n",
    "For information on cuDF, refer to the cuDF documentation: https://docs.rapids.ai/api/cudf/stable\n",
    "\n",
    "For information on cuML's Holt-Winters implementation: https://rapidsai.github.io/projects/cuml/en/stable/api.html#holtwinters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "import pandas as pd\n",
    "import cudf as gd\n",
    "\n",
    "from sklearn.metrics import r2_score\n",
    "\n",
    "from cuml.tsa.holtwinters import ExponentialSmoothing as cuES\n",
    "from statsmodels.tsa.holtwinters import ExponentialSmoothing as smES"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_series = 500\n",
    "n_samples = 750"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Data\n",
    "\n",
    "To create a dataset on which to run Holt-Winters, we will artificially create additive time series by generating trend-components, seasonality-components, and noise-components, and taking the sum. Below we define our time series generator `get_timeseries_components`, which returns a tuple of randomly generated trend (with slope `m` and intercept `b`), season (with frequency `f` and amplitude `amp`), and noise (from a Gaussian distribution with scale `scale`) components of length `fs` -- by adding these parts together, we get a complete series on which to run Holt-Winters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Host"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def trend(x, m=1, b=0):\n",
    "    return m*x + b\n",
    "\n",
    "def sine_season(x, fs=100, f=2, amp=1):\n",
    "    return amp * np.sin(2*np.pi*f * (x/fs))\n",
    "\n",
    "def normal_noise(scale=1, size=1):\n",
    "    return np.random.normal(scale=scale, size=size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_timeseries_components(fs=100, f=4, m=1, b=0, amp=1, scale=1):\n",
    "    x = np.arange(fs)\n",
    "    t = trend(x, m, b)\n",
    "    s = sine_season(x, fs, f, amp)\n",
    "    n = normal_noise(scale, fs)\n",
    "    return (t, s, n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "np.random.seed(100)\n",
    "\n",
    "tt_split = int(0.8*n_samples)\n",
    "n_preds = n_samples - tt_split\n",
    "train_pdf = pd.DataFrame()\n",
    "test_pdf = pd.DataFrame()\n",
    "\n",
    "for i in range(n_series):\n",
    "    t, s, n = get_timeseries_components(fs=n_samples,\n",
    "                                        f=4,\n",
    "                                        m=np.random.uniform(-2, 2),\n",
    "                                        b=np.random.uniform(-3, 3),\n",
    "                                        amp=np.random.uniform(-1, 1),\n",
    "                                        scale=np.random.uniform(1, 3))\n",
    "    time_series = t + s + n\n",
    "    train_pdf[i] = time_series[:tt_split]\n",
    "    test_pdf[i] = time_series[tt_split:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "train_gdf = gd.from_pandas(train_pdf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statsmodels Model\n",
    "\n",
    "smES requires that the time series `endog` be one-dimensional -- thus, to forecast out-of-sample predictions for multiple time series (in our case, all the columns of our dataframe), we have no choice but to iterate over the columns and for each time series, initialize, fit, and forecast. We store each series prediction in `sm_preds`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fit / Forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "sm_preds = np.zeros((n_series, n_preds))\n",
    "\n",
    "for i in range(len(train_pdf.columns)):\n",
    "    sm = smES(train_pdf[train_pdf.columns[i]], seasonal_periods=int(n_samples/4), seasonal='add')\n",
    "    sm = sm.fit()\n",
    "    sm_preds[i] = sm.forecast(n_preds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## cuML Model\n",
    "\n",
    "On the other hand, cuES allows for multi-dimensional input such as a cudf.DataFrame. When passed an entire dataframe, initialization, fitting, and forecasts for every series can be done simultaneously. These results are returned in a cudf.DataFrame, which we can cast to the same NumPy format as `sm_preds` by calling `as_matrix()`, and then to be row-major by calling `.transpose()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cu = cuES(train_gdf, seasonal_periods=int(n_samples/4), seasonal='add', ts_num=n_series)\n",
    "cu.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cu_preds = cu.forecast(n_preds).as_matrix().transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_arr = test_pdf.values.transpose()\n",
    "\n",
    "cu_r2_scores = r2_score(test_arr, cu_preds)\n",
    "sm_r2_scores = r2_score(test_arr, sm_preds)\n",
    "\n",
    "print(\"Average cuES r2 score: %s\" % np.mean(cu_r2_scores))\n",
    "print(\"Average smES r2 score: %s\" % np.mean(sm_r2_scores))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cuml4",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
